!./maq reviews --by date --date 2018-12 --bandinfo -n 1
2019-01-03 03:08:27,072 INFO: All requests will be throttled by 0.50 second(s)...
2019-01-03 03:08:27,072 INFO: Query reviews by date
2019-01-03 03:08:27,898 INFO: Found total of 691 records for date criteria 2018-12...
2019-01-03 03:08:27,898 INFO: Found a total of 691 records for all date criteria to retrieve...
2019-01-03 03:08:28,084 INFO: Retrieving 1 total records (batches = 4, batchsize = 200)...
2019-01-03 03:08:28,085 INFO: Processing reviews with the date criteria 2018-12...
2019-01-03 03:08:28,085 INFO: Fetching batch 1 of 4...
2019-01-03 03:08:28,906 INFO: Batch 1 has 200 records...
bid,name,origin,location,status,formedin,genre,themes,label,uid,aid,album,author,review,rating,date
3540390387,Chainsaw,Greece,Athens, Attica,Active,1997,Thrash Metal,Blasphemy| Anti-Religion,Hell's Fire Records,329559,673652,Filthy Blasphemy,Felix 1666, Bass guitar players are a pitiful species. They have only four strings, they are not allowed to play solos and many vile sound engineers forget to record their contributions when it comes to a new album. So all in all, they do not see the bright side of life too many times, to say the least. But Witchkiller, the bass guitarist of the Greece wrecking crew called Chainsaw, has decided to be no part of this miserable game. Thus, the bass lines have gained a pretty prominent position on "Filthy Blasphemy", the second album of the trio. A trio? Indeed, already the number of musicians gives a first indication concerning the style of the compositions. Chainsaw feel comfortable in bad company and so they have a drink with Venom, Warfare, Motörhead, Blizzard and (early) Tank. Needless to say that this meeting takes place in a very shabby bar with sticky tables and rancid girls behind the counter who have been quite sexy during the seventies of the last century (of course, Lemmy had them all).
And it goes without saying that nobody here has ever heard of any kind of hygiene regulations and so "Filthy Blasphemy" lives up to its name with great ease. Boozy, violent anthems like "Down in Vitriol", "Iron Casket" or the title track shape an album that does not lack power, malignancy and velocity. The riffing sounds fresh, the energy level leaves no wishes unfulfilled and no acoustic interludes disturb the constant aggression. Despite the "we don't give a f**k" attitude of the protagonists, the album does not suffer from a blurred production and both the precision and tight interplay of the band know how to please the ear. In terms of thrash, the Greece legions are still growing and it does not matter whether we speak about the pure old school style or the blackened approach that Chainsaw celebrate. Of course, one can say that the group serves absolutely every stereotype of the hybrid genre with tracks such as "Judas Double Trader" and its anti-religious lyrics, but who cares? The fact that the artwork sucks is also more or less completely irrelevant. As long as we have the commitment that the music is what really counts, all these minor deficiencies are blown away by the absolute highlight "Under the Hammer of Gore". Its pinpoint riff, the rapid beats of the snare, the commanding deep voice of the lead singer and the raid-like background vocals of the chorus express pure dedication and sheer brutal fun. The real essence of blackened thrash comes to life, while other songs like "Hooves (at Your Door)" emphasize the rather sordid side of their music. These tunes draw the shorter straw in comparison with the more vehement pieces, but nobody must worry about big differences in terms of quality. Did I already tell you that bass guitar players live a very interesting life (as long as they play metal)? They are part of a devastating horde, they know the value of teamwork and they spread their music all over the world.
Sometimes they are even the descendants of people who made a huge contribution in the development of the European culture. Ask this guy called Witchkiller, if you don't believe me. ,80%,04:25
2019-01-03 03:08:30,295 INFO: Processed 1 records...
import numpy as np
import pandas as pd
import seaborn as sns
import calendar
import matplotlib.pyplot as plt
import json
import time
import wordcloud
import os
from collections import Counter
from math import pi
%matplotlib inline
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from spacy import displacy
# Preload spaCy's large word embeddings upfront.
nlp = spacy.load('en_core_web_lg')
import sklearn.preprocessing as pr
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.utils import shuffle
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from bokeh.embed import file_html
from bokeh.resources import CDN
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import BasicTicker, ColumnDataSource, ColorBar, Title, LinearColorMapper, PrintfTickFormatter, LogColorMapper, NumeralTickFormatter, LabelSet, HoverTool
from bokeh.palettes import Category20c, Inferno256, Magma256, Greys256, Spectral6, Viridis256, linear_palette
from bokeh.transform import cumsum, linear_cmap, jitter, transform
from bokeh.layouts import row, column
output_notebook()
if os.path.exists('data/checkpoint.csv'):
    df = pd.read_csv('data/checkpoint.csv')
else:
    df = pd.read_csv('data/reviews.csv')
df['formedin'] = df['formedin'].fillna(0).astype(int)
df['formedin_dt'] = pd.to_datetime(df['formedin'], errors='coerce', format='%Y')
df['rating_n'] = df['rating'].str.rstrip('%').astype('float') / 100.0
df['date_dt'] = pd.to_datetime(df['date'], errors='coerce')
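The ratings arrive as percentage strings (e.g. "85%"), so they are stripped and scaled to [0, 1]. A minimal standalone check of that normalization step, using a hypothetical sample Series:

```python
import pandas as pd

# Hypothetical sample of raw rating strings as they appear in the scraped data
sample = pd.Series(['85%', '12%', '100%'])

# Same transformation as above: strip the '%' sign and scale to [0, 1]
normalized = sample.str.rstrip('%').astype('float') / 100.0
print(normalized.tolist())  # → [0.85, 0.12, 1.0]
```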
# 80% or higher are considered a positive review
POSITIVE_THRESHOLD = 0.8
def calc_sentiment(rating):
    if rating >= POSITIVE_THRESHOLD:
        return 1
    else:
        return 0

df['sentiment'] = df['rating_n'].apply(calc_sentiment)
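The same binary labeling can also be done without a Python-level `apply`; a vectorized sketch of the equivalent (on hypothetical ratings, not the notebook's DataFrame):

```python
import pandas as pd

POSITIVE_THRESHOLD = 0.8  # 80% or higher counts as a positive review

# Hypothetical normalized ratings
ratings = pd.Series([0.85, 0.12, 0.80, 0.65])

# Boolean comparison, then cast to int: 1 = positive, 0 = negative
sentiment = (ratings >= POSITIVE_THRESHOLD).astype(int)
print(sentiment.tolist())  # → [1, 0, 1, 0]
```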
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100626 entries, 0 to 100625
Data columns (total 23 columns):
bid              100626 non-null int64
name             100626 non-null object
origin           100626 non-null object
location         99497 non-null object
status           100626 non-null object
formedin         100626 non-null int64
genre            100626 non-null object
themes           92045 non-null object
label            76208 non-null object
uid              100626 non-null int64
aid              100626 non-null int64
album            100626 non-null object
author           100626 non-null object
review           100626 non-null object
rating           100626 non-null object
date             100626 non-null object
date_dt          100626 non-null datetime64[ns]
formedin_dt      98958 non-null object
rating_n         100626 non-null float64
genre_n          100626 non-null object
review_words     100626 non-null object
review_lemmas    100626 non-null object
sentiment        100626 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(5), object(16)
memory usage: 17.7+ MB
df.head()
| bid | name | origin | location | status | formedin | genre | themes | label | uid | ... | review | rating | date | date_dt | formedin_dt | rating_n | genre_n | review_words | review_lemmas | sentiment | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 82292 | ... | \nI was introduced to this band in Metal Storm... | 85% | May 29th, 2007 | 2007-05-29 | 1999-01-01 | 0.85 | black | introduced band metal storm forums darkthrone ... | introduce band metal storm forum darkthrone cl... | 1 |
| 1 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 35007 | ... | \nLet me start by saying that the only reason ... | 88% | October 23rd, 2005 | 2005-10-23 | 1999-01-01 | 0.88 | black | let start saying reason decided check band ban... | let start reason decide check band band dumb h... | 1 |
| 2 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 49692 | ... | \nI've had experiences before where I just HAD... | 12% | February 3rd, 2006 | 2006-02-03 | 1999-01-01 | 0.12 | black | experiences check album album cover lyrical th... | experience check album album cover lyrical the... | 0 |
| 3 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 83768 | ... | \nInterestingly done, but hardly worth buying.... | 65% | April 26th, 2007 | 2007-04-26 | 1999-01-01 | 0.65 | black | interestingly hardly worth buying duo knows pr... | interestingly hardly worth buy duo know proble... | 0 |
| 4 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 82292 | ... | \nThis is maybe the tenth Turkish band I liste... | 60% | May 29th, 2007 | 2007-05-29 | 1999-01-01 | 0.60 | black | maybe tenth turkish band listened best far esp... | maybe tenth turkish band listen best far espec... | 0 |
5 rows × 23 columns
sample = df.sample(1)
print(sample.review.values[0], sample.rating.values[0])
People are prone and welcome to argue, but I don't think there was another band to come out of the Norwegian Second Wave that was as great as Emperor. Whereas other black metal acts of the time often tried to convey that tense miasma via chainsaw production or an explicit focus on atmosphere, Emperor assaulted the listener with aggressive technique and dissonant finesse, the likes of which would prove (in my opinion) to be far more interesting musically than most of their more primal contemporaries. Technique and dissonance in black metal have since been taken to their natural conclusions by bands like Deathspell Omega, but there's still something special about Emperor's style. I hear strong currents of that sound in Paimonia, a far more recent outfit from Serbia. The same technical skill, chaotic aggression and atmosphere are here in full on Disease Named Humanity, and while I'm certain a more forward-thinking approach would have done more to impress me, Paimonia are off to a strong start with this debut. Although Emperor are undoubtedly the central influence for Paimonia, comparisons might be made to others in the Scandinavian canon. The melodic phrasing and effective chord progressions of Dissection come readily to mind, although Paimonia often let technique and reverence of the unholy (and done to death) tritone dictate their songwriting. Nikola Pacek-Vetnic is listed as a full-time drummer for Paimonia, but it's essentially the brainchild of Bojan Vukoman, who plays guitar, bass and just about everything else on Disease Named Humanity. As it so happens, Vukoman is an excellent guitarist. He's found a strong balance of clarity and viciousness in the guitar tone, and the riffs are challenging. As a vocalist, the influence of Emperor is even more apparent; his voice sounds an injured animal is howling through a layer of phlegm. Vukoman definitely seems to have mirrored himself in the image of Ihsahn on Disease Named Humanity. 
He does, however, do it excellently on all fronts. The guitars lean towards the same biting treble as we've heard in the genre's past, although the clear production does seem to give the album a modern feel. Dissonance was clearly a big keyword when the album was on the drawing board, although the abrasiveness is kept on a short leash in favour of keeping things clear. Although the derivative style might curtail Paimonia's potential overall, it's the songwriting that feels weakest here. Disease Named Humanity starts off on a strong note with "As Plague Scourge This World Apart", but virtually every song thereafter becomes less impressive. A notable exception to this is "Depth Within Nothingness Called Life", which breathes some fresh life into the music with a violin arrangement. Barring that, I don't think the songs on the album get progressively worse so much as the listener becomes dreadfully accustomed to the small bag of tricks Paimonia offers in the writing. Especially given the fact that most of us listening have heard this tricks executed countless times before with the Second Wave classics, it's pretty difficult to stay attentive by the end of the album. Paimonia may struggle with finding an identity of their own, but it doesn't dissuade the fact that Disease Named Humanity is an excellently executed album that willingly takes on the challenge of continuing the style of one of the genre's best acts. Given the shallow palette Paimonia are currently offering with regards to songwriting, Disease Named Humanity can be as frustrating at times as it is impressive, and it is an impressive album. It's just clear that Paimonia has some work to do before their vision is up to par with the way they deliver it. Originally written for Heathen Harvest Periodical 64%
countries = df.groupby('origin')['name'].count()
countries = countries.reset_index().rename(columns={'name' : 'count'})
countries.head()
| origin | count | |
|---|---|---|
| 0 | Afghanistan | 1 |
| 1 | Albania | 1 |
| 2 | Algeria | 16 |
| 3 | Andorra | 30 |
| 4 | Angola | 1 |
ratings = df.groupby('origin')['rating_n'].mean().reset_index()
ratings.head()
| origin | rating_n | |
|---|---|---|
| 0 | Afghanistan | 0.590000 |
| 1 | Albania | 0.970000 |
| 2 | Algeria | 0.576250 |
| 3 | Andorra | 0.874667 |
| 4 | Angola | 0.720000 |
dfr = ratings.set_index('origin')
dfr.head()
| rating_n | |
|---|---|
| origin | |
| Afghanistan | 0.590000 |
| Albania | 0.970000 |
| Algeria | 0.576250 |
| Andorra | 0.874667 |
| Angola | 0.720000 |
len(df.origin.unique())
135
np.sort(df.origin.unique())
array(['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan',
'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
'Belize', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana',
'Brazil', 'Brunei', 'Bulgaria', 'Canada', 'Chile', 'China',
'Colombia', 'Costa Rica', 'Croatia', 'Cuba', 'Curaçao', 'Cyprus',
'Czech Republic', 'Denmark', 'Dominican Republic', 'Ecuador',
'Egypt', 'El Salvador', 'Estonia', 'Ethiopia', 'Faroe Islands',
'Finland', 'France', 'Georgia', 'Germany', 'Gibraltar', 'Greece',
'Greenland', 'Guatemala', 'Guernsey', 'Honduras', 'Hong Kong',
'Hungary', 'Iceland', 'India', 'Indonesia', 'International',
'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan',
'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Korea, South',
'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon',
'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macedonia (FYROM)',
'Malaysia', 'Maldives', 'Malta', 'Mexico', 'Moldova', 'Monaco',
'Mongolia', 'Montenegro', 'Morocco', 'Namibia', 'Nepal',
'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua',
'Norway', 'Oman', 'Pakistan', 'Panama', 'Paraguay', 'Peru',
'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar',
'Romania', 'Russia', 'Saudi Arabia', 'Serbia', 'Singapore',
'Slovakia', 'Slovenia', 'South Africa', 'Spain', 'Sri Lanka',
'Svalbard', 'Sweden', 'Switzerland', 'Syria', 'Taiwan',
'Tajikistan', 'Thailand', 'Trinidad and Tobago', 'Tunisia',
'Turkey', 'Turkmenistan', 'Uganda', 'Ukraine',
'United Arab Emirates', 'United Kingdom', 'United States',
'Unknown', 'Uruguay', 'Uzbekistan', 'Venezuela', 'Vietnam',
'Åland Islands'], dtype=object)
countries.head()
| origin | count | |
|---|---|---|
| 0 | Afghanistan | 1 |
| 1 | Albania | 1 |
| 2 | Algeria | 16 |
| 3 | Andorra | 30 |
| 4 | Angola | 1 |
dfc = countries.set_index('origin')
dfc['count'].max(), dfc['count'].min()
(29745, 1)
dfr['rating_n'].max(), dfr['rating_n'].min()
(0.97, 0.3)
from bokeh.models import GeoJSONDataSource
# Load a hi-res countries GeoJSON file (originally grabbed from GitHub) from disk
geojson = open('data/countries-hires.json', 'r', encoding='latin-1').read()
# Add 'count' and mean rating to each feature's properties so the map can reference them
dfc = countries.set_index('origin')
js = json.loads(geojson)
features = js['features']
for feature in features:
    name = feature['properties']['NAME']
    if name in dfc.index:
        feature['properties']['COUNT'] = str(dfc.loc[name]['count'])
        feature['properties']['MEAN_RATING'] = str(dfr.loc[name]['rating_n'] * 100)
    else:
        feature['properties']['COUNT'] = str(0)
        feature['properties']['MEAN_RATING'] = str(0)
geo_source = GeoJSONDataSource(geojson=json.dumps(js))
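The enrichment loop above just stamps per-country values into each feature's `properties` dict before the GeoJSON is handed to Bokeh. On a minimal hand-built feature (hypothetical data, not the real file) the pattern looks like this:

```python
import json

# A minimal, hypothetical GeoJSON-like structure with a single feature
js = {'features': [{'properties': {'NAME': 'Greece'}}]}
counts = {'Greece': 691}  # hypothetical review counts keyed by country name

for feature in js['features']:
    name = feature['properties']['NAME']
    # Values are stored as strings, matching the notebook's approach
    feature['properties']['COUNT'] = str(counts.get(name, 0))

print(json.dumps(js['features'][0]['properties']))
```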
mapper = LinearColorMapper(palette=Inferno256[:len(dfr)], low=dfr['rating_n'].min(), high=dfr['rating_n'].max())
p1 = figure(height=700, width=1000, tools="", toolbar_location=None)
p1.patches('xs', 'ys', fill_alpha=0.7, fill_color={'field': 'MEAN_RATING', 'transform': mapper},
line_color='white', line_width=0.5, source=geo_source)
p1.add_layout(Title(text="Average Rating by Band Location World Map", align="center"), "below")
p1.grid.grid_line_color = None
p1.xaxis.visible = False
p1.yaxis.visible = False
p1.xgrid.visible = False
p1.ygrid.visible = False
# Add a hover tool showing the country name, mean rating and review count
p1.add_tools(HoverTool(
    tooltips=[
        ("Country", "@NAME"),
        ("Avg. Rating", "@MEAN_RATING{(00.0)}%"),
        ("Reviews", "@COUNT"),
    ],
    formatters={
        "@MEAN_RATING": "numeral",
        "@COUNT": "numeral",
    }
))
mapper = LinearColorMapper(palette=Inferno256[:len(dfc)], low=dfc['count'].min(), high=dfc['count'].max())
p2 = figure(height=700, width=1000, tools="", toolbar_location=None)
p2.patches('xs', 'ys', fill_alpha=0.7, fill_color={'field': 'COUNT', 'transform': mapper},
line_color='white', line_width=0.5, source=geo_source)
p2.add_layout(Title(text="Number of Reviews by Band Location World Map", align="center"), "below")
p2.grid.grid_line_color = None
p2.xaxis.visible = False
p2.yaxis.visible = False
p2.xgrid.visible = False
p2.ygrid.visible = False
# Add a hover tool showing the country name, mean rating and review count
p2.add_tools(HoverTool(
    tooltips=[
        ("Country", "@NAME"),
        ("Avg. Rating", "@MEAN_RATING{(00.0)}%"),
        ("Reviews", "@COUNT"),
    ],
    formatters={
        "@MEAN_RATING": "numeral",
        "@COUNT": "numeral",
    }
))
show(p1)
show(p2)
html = file_html(p1, CDN, "avgratingsbandsworldmap-plot")
with open('data/avgratingsbandsworldmap-plot.html', 'w') as f:
f.write(html)
html = file_html(p2, CDN, "avgnumbandsworldmap-plot")
with open('data/avgnumbandsworldmap-plot.html', 'w') as f:
f.write(html)
adf = df.copy()
adf['review_year'] = adf['date_dt'].dt.year
adf['review_month'] = adf['date_dt'].dt.month
adf.head()
| bid | name | origin | location | status | formedin | genre | themes | label | uid | ... | date | date_dt | formedin_dt | rating_n | genre_n | review_words | review_lemmas | sentiment | review_year | review_month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 82292 | ... | May 29th, 2007 | 2007-05-29 | 1999-01-01 | 0.85 | black | introduced band metal storm forums darkthrone ... | introduce band metal storm forum darkthrone cl... | 1 | 2007 | 5 |
| 1 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 35007 | ... | October 23rd, 2005 | 2005-10-23 | 1999-01-01 | 0.88 | black | let start saying reason decided check band ban... | let start reason decide check band band dumb h... | 1 | 2005 | 10 |
| 2 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 49692 | ... | February 3rd, 2006 | 2006-02-03 | 1999-01-01 | 0.12 | black | experiences check album album cover lyrical th... | experience check album album cover lyrical the... | 0 | 2006 | 2 |
| 3 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 83768 | ... | April 26th, 2007 | 2007-04-26 | 1999-01-01 | 0.65 | black | interestingly hardly worth buying duo knows pr... | interestingly hardly worth buy duo know proble... | 0 | 2007 | 4 |
| 4 | 33438 | ...Aaaarrghh... | Turkey | Ankara | Unknown | 1999 | Black Metal | The Agony of the Afterlife | NaN | 82292 | ... | May 29th, 2007 | 2007-05-29 | 1999-01-01 | 0.60 | black | maybe tenth turkish band listened best far esp... | maybe tenth turkish band listen best far espec... | 0 | 2007 | 5 |
5 rows × 25 columns
ddf = adf.groupby(['review_year', 'review_month'])['review'].count().reset_index()
ddf = ddf.rename(columns={'review_year': 'Year', 'review_month': 'Month', 'review': 'Count'})
ddf.head()
| Year | Month | Count | |
|---|---|---|---|
| 0 | 2002 | 7 | 35 |
| 1 | 2002 | 8 | 310 |
| 2 | 2002 | 9 | 42 |
| 3 | 2002 | 10 | 64 |
| 4 | 2002 | 11 | 72 |
cdf = adf.groupby(['review_year', 'review_month'])['rating_n'].mean().reset_index()
cdf = cdf.rename(columns={'review_year': 'Year', 'review_month': 'Month', 'rating_n': 'Rating'})
cdf['Year'] = cdf['Year'].astype(str)
cdf['Count'] = ddf['Count']
cdf.head()
| Year | Month | Rating | Count | |
|---|---|---|---|---|
| 0 | 2002 | 7 | 0.829429 | 35 |
| 1 | 2002 | 8 | 0.772742 | 310 |
| 2 | 2002 | 9 | 0.859048 | 42 |
| 3 | 2002 | 10 | 0.826250 | 64 |
| 4 | 2002 | 11 | 0.780417 | 72 |
acdf = cdf.pivot(index="Year", columns="Month", values="Rating").fillna(0)
acdf.index = acdf.index.map(str)
acdf.head()
| Month | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Year | ||||||||||||
| 2002 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.829429 | 0.772742 | 0.859048 | 0.826250 | 0.780417 | 0.766739 |
| 2003 | 0.794586 | 0.788333 | 0.812756 | 0.758246 | 0.798598 | 0.800460 | 0.840994 | 0.810782 | 0.839610 | 0.821250 | 0.827163 | 0.799866 |
| 2004 | 0.810332 | 0.825000 | 0.812195 | 0.821558 | 0.815828 | 0.803860 | 0.817981 | 0.791423 | 0.812903 | 0.808825 | 0.813662 | 0.808625 |
| 2005 | 0.822619 | 0.801692 | 0.826941 | 0.810052 | 0.829125 | 0.817249 | 0.817197 | 0.829545 | 0.806301 | 0.811946 | 0.796417 | 0.831208 |
| 2006 | 0.818533 | 0.813261 | 0.769806 | 0.814383 | 0.819020 | 0.809942 | 0.819776 | 0.802465 | 0.801928 | 0.817741 | 0.817387 | 0.817363 |
cdf.head()
| Year | Month | Rating | Count | |
|---|---|---|---|---|
| 0 | 2002 | 7 | 0.829429 | 35 |
| 1 | 2002 | 8 | 0.772742 | 310 |
| 2 | 2002 | 9 | 0.859048 | 42 |
| 3 | 2002 | 10 | 0.826250 | 64 |
| 4 | 2002 | 11 | 0.780417 | 72 |
acdf = acdf.rename(columns=dict(zip(range(13), calendar.month_abbr)))
acdf.head()
| Month | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Year | ||||||||||||
| 2002 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.829429 | 0.772742 | 0.859048 | 0.826250 | 0.780417 | 0.766739 |
| 2003 | 0.794586 | 0.788333 | 0.812756 | 0.758246 | 0.798598 | 0.800460 | 0.840994 | 0.810782 | 0.839610 | 0.821250 | 0.827163 | 0.799866 |
| 2004 | 0.810332 | 0.825000 | 0.812195 | 0.821558 | 0.815828 | 0.803860 | 0.817981 | 0.791423 | 0.812903 | 0.808825 | 0.813662 | 0.808625 |
| 2005 | 0.822619 | 0.801692 | 0.826941 | 0.810052 | 0.829125 | 0.817249 | 0.817197 | 0.829545 | 0.806301 | 0.811946 | 0.796417 | 0.831208 |
| 2006 | 0.818533 | 0.813261 | 0.769806 | 0.814383 | 0.819020 | 0.809942 | 0.819776 | 0.802465 | 0.801928 | 0.817741 | 0.817387 | 0.817363 |
cdf['Month'] = cdf.Month.apply(lambda x: calendar.month_abbr[x] if type(x) == int else x)
cdf.head()
| Year | Month | Rating | Count | |
|---|---|---|---|---|
| 0 | 2002 | Jul | 0.829429 | 35 |
| 1 | 2002 | Aug | 0.772742 | 310 |
| 2 | 2002 | Sep | 0.859048 | 42 |
| 3 | 2002 | Oct | 0.826250 | 64 |
| 4 | 2002 | Nov | 0.780417 | 72 |
rdf = adf.groupby('review_year')['rating_n'].mean().reset_index()
rdf['review_year'] = rdf['review_year'].apply(str)
rdf['count'] = adf.groupby('review_year')['review'].count().reset_index().rename(columns={'review':'count'})['count']
source = ColumnDataSource(cdf)
mapper = LinearColorMapper(palette=Inferno256, low=cdf['Rating'].min(), high=cdf['Rating'].max())
color_bar = ColorBar(color_mapper=mapper, location=(0, 0),
ticker=BasicTicker(desired_num_ticks=7),
formatter=PrintfTickFormatter(format="%0.2f"))
months = list(acdf.columns)
years = list(acdf.index)
p1 = figure(plot_width=500, plot_height=500, toolbar_location=None,
x_range=years, y_range=list(reversed(months)),
tools="", x_axis_location="above")
p1.rect(x="Year", y="Month", width=1, height=1, source=source,
line_color=None, fill_color=transform('Rating', mapper))
p1.add_layout(color_bar, 'right')
p1.add_layout(Title(text="Average Ratings Through The Years Heatmap", align="center"), "below")
p1.add_tools(HoverTool(
    tooltips=[
        ("Count", "@Count"),
        ("Average", "@Rating"),
    ],
))
p1.axis.axis_line_color = None
p1.axis.major_tick_line_color = None
p1.axis.major_label_text_font_size = "10pt"
p1.axis.major_label_standoff = 0
p1.xaxis.major_label_orientation = 1.0
p1.xgrid.visible = False
p1.ygrid.visible = False
source = ColumnDataSource(rdf)
p2 = figure(plot_width=500, plot_height=250, x_range=list(rdf['review_year']), toolbar_location=None)
p2.add_tools(HoverTool(tooltips="@rating_n"))
p2.add_layout(Title(text="Average Ratings Through the Years", align="center"), "below")
p2.line(x='review_year', y='rating_n', color='firebrick', source=source)
p2.xgrid.visible = False
p2.ygrid.visible = False
p2.xaxis.major_label_orientation = pi/4
p3 = figure(plot_width=500, plot_height=250, x_range=list(rdf['review_year']), toolbar_location=None)
p3.add_tools(HoverTool(tooltips="@count"))
p3.add_layout(Title(text="Number of Reviews Through the Years", align="center"), "below")
p3.line(x='review_year', y='count', color='orange', source=source)
p3.xgrid.visible = False
p3.ygrid.visible = False
p3.xaxis.major_label_orientation = pi/4
fig = row(p1, column(p2, p3))
show(fig)
html = file_html(fig, CDN, "heatmap-plot")
with open('data/heatmap-plot.html', 'w') as f:
f.write(html)
len(df['genre'].unique())
3467
df.groupby('genre')['review'].count()
genre
AOR (early)| Power/Progressive Metal (later) 7
Acoustic Folk| Raw Black Metal 11
Acoustic Rock (early)| Progressive Metal (later) 3
Acoustic/Folk 4
Alternative Metal (early)| Crossover/Grunge (later) 1
Alternative Rock/Power Metal with Electronic Influences 22
Alternative/Art Rock (early)| Heavy Metal (later) 2
Alternative/Groove Metal 1
Alternative/Grunge/Metal 4
Alternative/Heavy Metal 1
Alternative/Melodic Death/Thrash Metal 2
Alternative/Progressive Metal 3
Ambient 1
Ambient (early)| Black Metal (later) 2
Ambient (early)| Depressive Black Metal (later) 2
Ambient (early)| Raw Black Metal (later) 1
Ambient (early)| Symphonic Black Metal/Ambient (later) 14
Ambient (early)| Symphonic Metal (later) 1
Ambient Black Metal 95
Ambient Black Metal (early)| Ambient Black Metal with Industrial influences (later) 2
Ambient Black Metal (early)| Atmospheric/Post-Black Metal (later) 1
Ambient Black Metal (early)| Progressive Rock (later) 2
Ambient Black Metal/Post-Rock/Folk 9
Ambient Black Metal| Drone Doom 29
Ambient Black/Death Metal 1
Ambient Black/Doom Metal 4
Ambient Black/Thrash Metal 8
Ambient Drone (early)| Atmospheric Funeral Doom Metal (later) 1
Ambient Drone/Black Metal 3
Ambient Drone/Doom Metal 116
...
Thrash/Speed/Power Metal 9
Thrash/Stoner Metal 3
Thrash/Stoner/Sludge Metal 1
Various 332
Viking Black Metal 7
Viking Black/Thrash Metal 2
Viking Death Metal 1
Viking Metal 59
Viking Metal (early)| Black Metal (later) 3
Viking Metal/RAC 1
Viking/Black Metal 77
Viking/Black Metal (early)| Post-Metal/Rock (later) 33
Viking/Black Metal with Folk Influences 2
Viking/Black Metal| Neofolk 9
Viking/Black/Folk Metal 71
Viking/Death Metal 1
Viking/Death Metal (early)| Melodic Death Metal (later) 1
Viking/Doom/Gothic Metal 2
Viking/Epic Black Metal 1
Viking/Epic Power Metal 1
Viking/Folk Black Metal 1
Viking/Folk Metal 59
Viking/Folk Metal (early)| Indie/Folk Rock (later) 10
Viking/Folk Metal| Progressive Metal 37
Viking/Folk/Black Metal 4
Viking/Folk/Melodic Black Metal 2
Viking/Heavy Metal 1
Viking/Melodic Death Metal 1
Viking/Power Metal 5
Yiddish Folk Metal 2
Name: review, Length: 3467, dtype: int64
# We will standardize on the first genre-specific word in the 'genre' column.
# All of this is *very* subjective, but given my 25+ years of listening to metal
# as well as writing about it online, I can confidently say that the genres
# produced by this function are fairly in line with what most fans and critics
# consider the major metal sub-genres.
def parseGenre(genre):
    # Take the last genre the band is classified in as the de facto one.
    if '|' in genre:
        genres = genre.split('|')[-1]
    else:
        genres = genre
    genres = genres.replace('/', ' ').lower().strip().split()
    # Get rid of all non-genre-specific words. This should leave us with the bare
    # essentials that make up a "trve" genre.
    remove_list = ['art', 'film', 'urban', 'ritual', 'powerviolence', 'celtic', 'middle', 'eastern', 'with',
                   'egyptian', 'jazz', 'shoegaze', 'orchestral', 'fusion', 'martial', 'and', 'bass', 'dance',
                   'horror', 'acoustic', 'post-', 'yiddish', 'opera', 'operatic', 'minimalistic', 'dub', 'indie',
                   '(early)', '(mid)', '(late)', '(later)', 'oi!', 'blackened', 'hard', 'atmospheric', 'harsh',
                   'trip', 'hop', 'raw', 'dark', 'extreme', 'melodic', 'epic', 'brutal', 'technical', 'electro',
                   'electronica', 'depressive', 'experimental', 'funeral', 'medieval', 'metal', 'electronic',
                   'southern', 'funk', 'score', 'drum', 'space', 'flamenco', 'techno', 'darkwave', 'classical',
                   'crossover', 'influences', 'neoclassical', 'world', 'music', 'glam', 'pop', 'grunge', 'crustcore',
                   'thrashcore', 'crust', 'ebm', 'psychedelic', 'blues', 'avant-garde']
    glist = [x for x in genres if x not in remove_list]
    # Map a few very specific sub-genres into their general niche. For example,
    # there is really no good reason to have a 'neofolk' vs. a 'folk' genre
    # category. Just zonk both under 'folk'.
    #
    # The same goes for the 'post-' derivatives, etc. Just label them all as post-metal.
    genre_map = {
        'shred': 'thrash',
        'speed': 'thrash',
        'aor': 'rock',
        'rac': 'rock',
        'post-black': 'post-metal',
        'post-hardcore': 'post-metal',
        'post-rock': 'post-metal',
        'post': 'post-metal',
        'goregrind': 'grind',
        'neofolk': 'folk',
        'djent': 'progressive',
        'stoner': 'doom',
        'deathcore': 'death',
        'grindcore': 'grind',
        'nwobhm': 'heavy',
        'sludge': 'doom',
        'pagan': 'viking',
        'noise': 'drone',
    }
    # Handle a few special cases
    if not glist:
        return 'various'
    elif glist[0] in genre_map:
        return genre_map[glist[0]]
    else:
        return glist[0]
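To see the normalization steps in action on one raw genre string, here is a condensed sketch of the same pipeline with a trimmed-down stop-list and map (a hypothetical subset, not the full lists above):

```python
# Condensed illustration of parseGenre's steps: take the last '|'-separated
# genre, strip non-genre words, then map niche sub-genres to their parent.
remove_list = {'blackened', 'metal', 'melodic', '(later)'}  # trimmed subset
genre_map = {'speed': 'thrash', 'sludge': 'doom'}           # trimmed subset

def normalize(genre):
    # Take the last genre the band is classified in as the de facto one
    last = genre.split('|')[-1]
    words = last.replace('/', ' ').lower().strip().split()
    glist = [w for w in words if w not in remove_list]
    if not glist:
        return 'various'
    return genre_map.get(glist[0], glist[0])

print(normalize('Heavy Metal (early)| Blackened Speed/Thrash Metal (later)'))  # → 'thrash'
```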
df['genre_n'] = df['genre'].apply(parseGenre)
gdf = df.groupby('genre_n')['review'].count().reset_index().rename(columns={'review':'count'})
gdf = gdf.sort_values(by='count')
print(df.genre_n.unique())
print(len(df.genre_n.unique()))
['black' 'industrial' 'death' 'progressive' 'doom' 'gothic' 'power' 'metalcore' 'post-metal' 'symphonic' 'rock' 'heavy' 'thrash' 'drone' 'groove' 'ambient' 'viking' 'grind' 'various' 'folk' 'hardcore' 'alternative' 'punk' 'nu-metal' 'synthwave'] 25
gdf['angle'] = gdf['count']/gdf['count'].sum() * 2*pi
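The `angle` column allocates each genre a pie slice proportional to its share of reviews, so the slices must sum to a full circle; a quick check on hypothetical counts:

```python
from math import pi, isclose
import pandas as pd

# Hypothetical review counts per genre
gdf = pd.DataFrame({'genre_n': ['black', 'death', 'doom'], 'count': [300, 200, 100]})
gdf['angle'] = gdf['count'] / gdf['count'].sum() * 2 * pi

# The wedge angles cover the whole pie (2*pi radians)
print(isclose(gdf['angle'].sum(), 2 * pi))  # → True
```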
total_palette = Inferno256[:128] + Viridis256[128:]
colors = list(reversed(linear_palette(total_palette, n=len(gdf))))
gdf['color'] = colors
source = ColumnDataSource(gdf)
p = figure(plot_height=700, plot_width=1000, tools="hover",
toolbar_location=None, tooltips="@genre_n: @count")
p.wedge(x=0, y=1, radius=0.6,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color='white', fill_color='color', legend='genre_n', source=source)
p.add_layout(Title(text="Number of Reviews by Genre", align="center"), "below")
p.axis.visible=False
p.grid.grid_line_color = None
show(p)
html = file_html(p, CDN, "genres-plot")
with open('data/genres-plot.html', 'w') as f:
f.write(html)
hdf = df.groupby('genre_n')['rating_n'].mean().reset_index()
hdf = hdf.sort_values(by='rating_n')
print(hdf.head())
print(hdf.tail())
genre_n rating_n
14 nu-metal 0.527714
0 alternative 0.680185
13 metalcore 0.682284
9 groove 0.689954
5 drone 0.721308
genre_n rating_n
11 heavy 0.782310
16 power 0.785444
6 folk 0.792320
17 progressive 0.804820
4 doom 0.807966
genres = hdf['genre_n']
ratings = hdf['rating_n']
hdf['color'] = Inferno256[:len(hdf)]
cds = ColumnDataSource(hdf)
p = figure(x_range=genres, plot_width=1000, plot_height=500, tools="hover",
           toolbar_location=None, tooltips="@genre_n: @rating_n")
p.vbar(x='genre_n', width=0.9, top='rating_n', fill_color='color', source=cds)
p.line(x=genres, y=ratings, color="firebrick", line_width=4)
p.add_layout(Title(text="Average Rating by Genre", align="center"), "below")
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_color = None
show(p)
html = file_html(p, CDN, "avgratingbygenre-plot")
with open('data/avgratingbygenre-plot.html', 'w') as f:
    f.write(html)
df[df['formedin'] != 0].sort_values(by='formedin').head(1)
| | bid | name | origin | location | status | formedin | genre | themes | label | uid | ... | review | rating | date | date_dt | formedin_dt | rating_n | genre_n | review_words | review_lemmas | sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 80105 | 15424 | Scorpions | Germany | Sarstedt, Lower Saxony | Active | 1964 | Heavy Metal/Hard Rock | Life| Society| Love| Sex| Inner struggles| Rock | EMI | 81007 | ... | \nThis was the first Scorpions album that I go... | 85% | May 15th, 2018 | 2018-05-15 | 1964-01-01 | 0.85 | heavy | scorpions album got converted devout fan band ... | scorpion album convert devout fan band convers... | 1 |
1 rows × 23 columns
entry = df[df['author'] == 'autothrall'].sample(1)
doc = nlp(entry['review'].values[0])
#sentence_spans = list(doc.sents)
#displacy.render(sentence_spans, style='dep',jupyter=True)
doc.user_data['title'] = entry['album'].values[0] + " - " + entry['name'].values[0] + " by " + entry['author'].values[0] + " (" + entry['rating'].values[0] + ")"
displacy.render(doc, style='ent',jupyter=True)
adf = df.copy()
adf['review_year'] = adf['date_dt'].dt.year
adf = adf.groupby('review_year')['review_wordcnt'].mean().reset_index().rename({'review_wordcnt': 'wordcnt', 'review_year': 'year'}, axis=1)
adf['year'] = adf['year'].apply(str)
p = figure(plot_width=1000, plot_height=500, x_range=list(adf['year']), toolbar_location=None)
p.add_tools(HoverTool(tooltips="@wordcnt"))
p.line(x='year', y='wordcnt', color='black', source=ColumnDataSource(adf))
p.add_layout(Title(text="Average Number of Words Per Review Through the Years", align="center"), "below")
p.xgrid.visible = False
p.ygrid.visible = False
p.xaxis.major_label_orientation = pi/4
show(p)
WC_STOP = ['album', 'albums', 'metal', 'music', 'vocals', 'sound', 'actually', 'doom',
           'sounds', 'bands', 'band', 'song', 'songs', 'black', 'death',
           'track', 'riffs', 'tracks', 'record', 'find', 'power', 'things', 'demo',
           'thrash', 'know', 'work', 'makes', 'listen', 'release', 'title', 'end',
           'debut', 'genre', 'heavy', 'nt', 'way']
# Copy first so word_freq itself is not mutated; pop() tolerates missing keys,
# where a bare del would raise KeyError.
aword_freq = word_freq.copy()
for word in WC_STOP:
    aword_freq.pop(word, None)
wc = wordcloud.WordCloud(width=1000, height=500)
wc.fit_words(aword_freq)
plt.figure(figsize=(30,10), facecolor='k')
plt.axis("off")
plt.imshow(wc, interpolation="bilinear")
# Save before show(): show() clears the current figure, so saving afterwards
# would write out a blank image.
plt.savefig('wordcloud.png')
plt.show()
<matplotlib.figure.Figure at 0x7f78acc4b278>
import re
# Text pipeline loosely based on a KDNuggets article.
#
# Source: https://www.kdnuggets.com/2018/08/practitioners-guide-processing-understanding-text-2.html
CONTRACTION_MAP = {
"ain't": "is not",
"aren't": "are not",
"can't": "cannot",
"can't've": "cannot have",
"'cause": "because",
"could've": "could have",
"couldn't": "could not",
"couldn't've": "could not have",
"didn't": "did not",
"doesn't": "does not",
"don't": "do not",
"hadn't": "had not",
"hadn't've": "had not have",
"hasn't": "has not",
"haven't": "have not",
"he'd": "he would",
"he'd've": "he would have",
"he'll": "he will",
"he'll've": "he will have",
"he's": "he is",
"how'd": "how did",
"how'd'y": "how do you",
"how'll": "how will",
"how's": "how is",
"I'd": "I would",
"I'd've": "I would have",
"I'll": "I will",
"I'll've": "I will have",
"I'm": "I am",
"I've": "I have",
"i'd": "i would",
"i'd've": "i would have",
"i'll": "i will",
"i'll've": "i will have",
"i'm": "i am",
"i've": "i have",
"isn't": "is not",
"it'd": "it would",
"it'd've": "it would have",
"it'll": "it will",
"it'll've": "it will have",
"it's": "it is",
"let's": "let us",
"ma'am": "madam",
"mayn't": "may not",
"might've": "might have",
"mightn't": "might not",
"mightn't've": "might not have",
"must've": "must have",
"mustn't": "must not",
"mustn't've": "must not have",
"needn't": "need not",
"needn't've": "need not have",
"o'clock": "of the clock",
"oughtn't": "ought not",
"oughtn't've": "ought not have",
"shan't": "shall not",
"sha'n't": "shall not",
"shan't've": "shall not have",
"she'd": "she would",
"she'd've": "she would have",
"she'll": "she will",
"she'll've": "she will have",
"she's": "she is",
"should've": "should have",
"shouldn't": "should not",
"shouldn't've": "should not have",
"so've": "so have",
"so's": "so as",
"that'd": "that would",
"that'd've": "that would have",
"that's": "that is",
"there'd": "there would",
"there'd've": "there would have",
"there's": "there is",
"they'd": "they would",
"they'd've": "they would have",
"they'll": "they will",
"they'll've": "they will have",
"they're": "they are",
"they've": "they have",
"to've": "to have",
"wasn't": "was not",
"we'd": "we would",
"we'd've": "we would have",
"we'll": "we will",
"we'll've": "we will have",
"we're": "we are",
"we've": "we have",
"weren't": "were not",
"what'll": "what will",
"what'll've": "what will have",
"what're": "what are",
"what's": "what is",
"what've": "what have",
"when's": "when is",
"when've": "when have",
"where'd": "where did",
"where's": "where is",
"where've": "where have",
"who'll": "who will",
"who'll've": "who will have",
"who's": "who is",
"who've": "who have",
"why's": "why is",
"why've": "why have",
"will've": "will have",
"won't": "will not",
"won't've": "will not have",
"would've": "would have",
"wouldn't": "would not",
"wouldn't've": "would not have",
"y'all": "you all",
"y'all'd": "you all would",
"y'all'd've": "you all would have",
"y'all're": "you all are",
"y'all've": "you all have",
"you'd": "you would",
"you'd've": "you would have",
"you'll": "you will",
"you'll've": "you will have",
"you're": "you are",
"you've": "you have"
}
import unicodedata
def remove_accented_chars(text):
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('utf-8', 'ignore')
    return text
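As a quick sanity check, NFKD normalization decomposes accented characters into a base letter plus a combining mark, and the ASCII encode-with-ignore then drops the mark:

```python
import unicodedata

def remove_accented_chars(text):
    # Decompose accented characters (NFKD), then drop the non-ASCII combining marks.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('utf-8', 'ignore')

print(remove_accented_chars('Motörhead'))  # Motorhead
print(remove_accented_chars('naïve'))      # naive
```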
def expand_contractions(text, contraction_mapping=CONTRACTION_MAP):
    contractions_pattern = re.compile('({})'.format('|'.join(contraction_mapping.keys())),
                                      flags=re.IGNORECASE|re.DOTALL)

    def expand_match(contraction):
        match = contraction.group(0)
        first_char = match[0]
        expanded_contraction = contraction_mapping.get(match)\
            if contraction_mapping.get(match)\
            else contraction_mapping.get(match.lower())
        expanded_contraction = first_char + expanded_contraction[1:]
        return expanded_contraction

    expanded_text = contractions_pattern.sub(expand_match, text)
    expanded_text = re.sub("'", "", expanded_text)
    return expanded_text
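A self-contained sketch of the same expansion logic, with a two-entry stand-in map; one extra precaution worth noting is sorting keys longest-first, so that e.g. "can't've" is not half-matched by its prefix "can't" in the regex alternation:

```python
import re

# Two-entry stand-in for the full CONTRACTION_MAP above.
CONTRACTION_MAP = {"can't": "cannot", "can't've": "cannot have", "they're": "they are"}

def expand_contractions(text, contraction_mapping=CONTRACTION_MAP):
    # Longest keys first, so "can't've" wins over its prefix "can't".
    keys = sorted(contraction_mapping, key=len, reverse=True)
    pattern = re.compile('({})'.format('|'.join(map(re.escape, keys))),
                         flags=re.IGNORECASE)

    def expand_match(m):
        match = m.group(0)
        # Preserve the capitalization of the first character ("They're" -> "They are").
        expanded = contraction_mapping.get(match) or contraction_mapping.get(match.lower())
        return match[0] + expanded[1:]

    return re.sub("'", "", pattern.sub(expand_match, text))

print(expand_contractions("They're sure we can't've finished"))
# They are sure we cannot have finished
```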
def get_words(text, lemmatize=False, keepstopwords=False):
    tokens = []
    doc = nlp(text)
    for token in doc:
        if token.is_punct:
            continue
        # Compare the token's lowercased text, not the Token object itself,
        # against the stopword set (a Token never equals a plain string).
        if not keepstopwords and token.text.lower() in STOP_WORDS:
            continue
        if lemmatize:
            if token.lemma_ == '-PRON-':
                tokens.append(token.text.lower())
            else:
                tokens.append(token.lemma_)
        else:
            tokens.append(token.text.lower())
    tokens = [tok.replace("\n", "").replace("\r", "") for tok in tokens]
    return [tok for tok in tokens if tok]
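The same filtering can be sketched without spaCy. This is a pure-Python stand-in with whitespace tokenization and a tiny illustrative stopword set (the real pipeline relies on spaCy's tokenizer and lemmatizer):

```python
import string

# Tiny illustrative stopword set; the notebook uses spaCy's STOP_WORDS.
STOP_WORDS = {'the', 'a', 'an', 'is', 'are', 'was', 'to', 'of'}

def get_words_simple(text, keepstopwords=False):
    tokens = []
    for raw in text.split():
        tok = raw.strip(string.punctuation).lower()
        if not tok:
            continue  # the token was pure punctuation
        if not keepstopwords and tok in STOP_WORDS:
            continue
        tokens.append(tok)
    return tokens

print(get_words_simple("The riffs are way more energetic!"))
# ['riffs', 'way', 'more', 'energetic']
```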
def process_text(text, lemmatize=True, keepstopwords=False):
    text = remove_accented_chars(text)
    text = expand_contractions(text)
    words = get_words(text, lemmatize=lemmatize, keepstopwords=keepstopwords)
    return words

def calc_wordcnt(words):
    return len(words.split())
df['review_wordcnt'] = df['review'].apply(lambda x: calc_wordcnt(x))
df['review_wordcnt'].describe()
count    100626.000000
mean        550.413273
std         273.531814
min          68.000000
25%         368.000000
50%         500.000000
75%         668.000000
max        8898.000000
Name: review_wordcnt, dtype: float64
html = file_html(p, CDN, "avgwordsbyyear-plot")
with open('data/avgwordsbyyear-plot.html', 'w') as f:
    f.write(html)
adf = df.copy()
adf = adf.groupby('sentiment')['review_wordcnt'].mean().reset_index()
adf
| | sentiment | review_wordcnt |
|---|---|---|
| 0 | 0 | 540.558538 |
| 1 | 1 | 556.919847 |
import multiprocessing as mp
import os
# Get words in review
def process_text_batch(batch, reviews):
    for i, row in batch.iterrows():
        text = process_text(row.review)
        reviews[i] = ' '.join(text)
# Do them in batches
batch_size = 100
batches = []
for start in range(0, len(df), batch_size):
    # iloc slicing clamps at the end of the frame, so the final short
    # batch needs no special case.
    batches.append(df.iloc[start:start+batch_size])
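Plain slicing already clamps past the end of a sequence, which is why the tail batch needs no special handling; a quick check on a list (pandas `.iloc` behaves the same way):

```python
# Slicing past the end clamps, so batching needs no tail special-case.
data = list(range(10))
batch_size = 3
batches = [data[start:start + batch_size] for start in range(0, len(data), batch_size)]
print(batches)  # the last batch is simply shorter
```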
# Allocate half the number of cores per batch
n = ncores = os.cpu_count()//2
print("Distributing batch work over {0} cores...".format(ncores))
batch_results = []
processes = []
batch_num = 1
with mp.Manager() as manager:
    reviews = manager.list(range(len(df)))
    for b in range(len(batches)):
        processes.append(mp.Process(target=process_text_batch, args=(batches[b], reviews)))
        if n == 1 or (len(batches) - b) < ncores:
            now = time.time()
            print("Starting batch {0} of {1} processing now...".format(batch_num, len(batches)//ncores))
            for p in processes:
                p.start()
            for p in processes:
                p.join()
            end = time.time()
            print("Finished batch {0} in {1} seconds...".format(batch_num, end-now))
            n = ncores
            processes = []
            batch_num += 1
        n -= 1
    for r in reviews:
        batch_results.append(r)
print("Finished processing all batch sets...")
df['review_words'] = batch_results
df['review_lemmas'] = df['review_words'].apply(lambda x: ' '.join(get_words(x, lemmatize=True)))
df.to_csv('data/checkpoint.csv', index=False)
word_freq = Counter()
r = df['review_words'].apply(lambda x: word_freq.update(x.split()))
word_freq.most_common(25)
[('album', 417421),
('metal', 377894),
('like', 266759),
('band', 218228),
('song', 201028),
('songs', 166185),
('sound', 153677),
('music', 143891),
('vocals', 136518),
('black', 128959),
('death', 128849),
('bands', 126453),
('good', 125131),
('track', 116393),
('riffs', 112948),
('guitar', 112406),
('time', 107538),
('nt', 91669),
('great', 82910),
('tracks', 78169),
('way', 75093),
('albums', 72057),
('best', 69990),
('sounds', 67814),
('bass', 67732)]
df['review_lemmas'] = df['review_words'].apply(lambda x: ' '.join(get_words(x, lemmatize=True)))
print(df['review'][0])
print('-----------------')
print(df['review_lemmas'][0])
I was introduced to this band in Metal Storm forums as a Darkthrone clone, met them again at other places as Darkthorne copycats, and see the same approach here as a Darkthrone copy. It is a pity to evaluate this band over a few songs skipped forward quickly, for they are absolutely no Darkthrone copy, copycat, clone or whatever else a band is capable of becoming. They are much more to it. The riffs are way more energetic and vivid than Darkthrone’s, and the overall music is absolutely way too fast and catchy. Unlike their previous album in which they looked as if they knew what Black Metal is but didn’t know in what way to play it; they stand strong and firm here, offering a great deal of remarkable material for the thrashy Black Metal audience. Judging by the repetition in the track names (and album’s beginning and ending with the same riff, and most probably with the same grunt), I think this is a concept album, something like Ulver did some time ago. In Fire Metal’s interview with the band, they have stated that the album’s name translates as “Cold As Death, Pale As Dead”. I can think of no better words to describe this album: This is cold and pale Black Metal (probably about the dead). Don’t get me wrong, I am not claiming that the band is 100% original. Well, their originality arises from mixing the best bands’ stuff out there and combining them with their own style and arrangement approaches. It is like tasting an excellently prepared cocktail made up of the best drinks around. This is maybe second to creating the originals, but still something highly worthy, and with a great potential. This album has spirit, pace, aggressiveness, some good musicianship and most important of all, it is flowing, thanks to the melodic riffs that project a dark atmosphere on their own. 
They are not melodic through usage of keyboards and faggot riffs, but preferring to play something more worthy to listen to than to raise the urge to destroy just like what Marduk and similar bands do. The drumming carry the music all through the album, never giving a sign of lack of energy, neither showing the urge to mess everything up for displays of technicality and to show off. Bass guitar, as in the previous album, gives the strong structure of the back-end on which the whole music stands strong. The tracks have consistency, the instruments have consistency and the album is eventually a compact shredder. Production is at a high standard relative to many Black Metal bands; nothing is too much standing out or seemingly getting lost, but everything is at the level where they should be in order to create a strong and pressurizing sound. If you like those old Norwegian and Swedish Black Metal bands, give it a real, hard listen without underrating the band with prejudices; you will not be disappointed. I’ll give it a beautiful 85, 15 points off to save for the next, for releasing it as mp3 and making such an obsessive man like me go mad because of not being able to know what the fuckin’ lyrics are all about. 
----------------- introduce band metal storm forum darkthrone clone meet place darkthorne copycat approach darkthrone copy pity evaluate band song skip forward quickly absolutely darkthrone copy copycat clone band capable riff way energetic vivid darkthrone overall music absolutely way fast catchy unlike previous album look know black metal nt know way play stand strong firm offer great deal remarkable material thrashy black metal audience judge repetition track album begin end riff probably grunt think concept album like ulver time ago fire metal interview band state album translate cold death pale dead think good word describe album cold pale black metal probably dead nt wrong claim band 100 original originality arise mix good band stuff combine style arrangement approach like tasting excellently prepared cocktail good drink maybe second create original highly worthy great potential album spirit pace aggressiveness good musicianship important flow thank melodic riff project dark atmosphere melodic usage keyboard faggot riff prefer play worthy listen raise urge destroy like marduk similar band drum carry music album sign lack energy urge mess display technicality bass guitar previous album strong structure end music stand strong track consistency instrument consistency album eventually compact shredder production high standard relative black metal band stand seemingly lose level order create strong pressurize sound like old norwegian swedish black metal band real hard listen underrate band prejudice disappoint ill beautiful 85 15 point save release mp3 obsessive man like mad able know fuckin lyric
# t-SNE
#
# Notes: https://distill.pub/2016/misread-tsne/
from gensim.models import word2vec
from sklearn.manifold import TSNE
corpus = []
_ = df['review_words'].apply(lambda x: corpus.append(x.split()))
model = word2vec.Word2Vec(corpus, size=300, window=20, min_count=500, workers=16)
labels = []
tokens = []
for word in model.wv.vocab:
    # Index through model.wv; calling model.__getitem__ directly is deprecated in gensim.
    tokens.append(model.wv[word])
    labels.append(word)
tsne_model = TSNE(perplexity=50, n_components=2, init='pca', n_iter=5000, random_state=23)
new_values = tsne_model.fit_transform(tokens)
x = []
y = []
for value in new_values:
x.append(value[0])
y.append(value[1])
vdf = pd.DataFrame.from_dict({'x': x, 'y': y, 'labels': labels})
p = figure(plot_width=1000, plot_height=500, toolbar_location=None)
p.circle('x', 'y', size=5, color="firebrick", source=vdf, alpha=0.5)
p.add_layout(Title(text="t-SNE Word Map", align="center"), "below")
p.add_tools(HoverTool(tooltips="@labels"))
p.xgrid.visible = False
p.ygrid.visible = False
p.xaxis.visible = False
p.yaxis.visible = False
show(p)
train_df = df.sample(frac=.90)
dev_df = df.loc[~df.index.isin(train_df.index)]
print(len(dev_df),len(train_df))
10063 90563
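The sample-plus-index-complement split above can be checked on a toy frame (pandas assumed; the index must be unique for `isin` to partition cleanly):

```python
import pandas as pd

# Toy frame standing in for df: a 90/10 split via sample + index complement.
df = pd.DataFrame({'review': ['r%d' % i for i in range(100)]})
train_df = df.sample(frac=0.90, random_state=0)
dev_df = df.loc[~df.index.isin(train_df.index)]

# The two pieces partition the frame: disjoint and exhaustive.
print(len(dev_df), len(train_df))  # 10 90
```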
tr_df = train_df[['review_lemmas', 'sentiment']]
dv_df = dev_df[['review', 'sentiment']]
train_data = []
for index, row in tr_df.iterrows():
    train_data.append((row['review_lemmas'], row['sentiment']))
dev_data = []
for index, row in dv_df.iterrows():
    dev_data.append((row['review'], row['sentiment']))
dev_data[0]
('\nI was introduced to this band in Metal Storm forums as a Darkthrone clone, met them again at other places as Darkthorne copycats, and see the same approach here as a Darkthrone copy. It is a pity to evaluate this band over a few songs skipped forward quickly, for they are absolutely no Darkthrone copy, copycat, clone or whatever else a band is capable of becoming. They are much more to it. The riffs are way more energetic and vivid than Darkthrone’s, and the overall music is absolutely way too fast and catchy. Unlike their previous album in which they looked as if they knew what Black Metal is but didn’t know in what way to play it; they stand strong and firm here, offering a great deal of remarkable material for the thrashy Black Metal audience. \n\r\nJudging by the repetition in the track names (and album’s beginning and ending with the same riff, and most probably with the same grunt), I think this is a concept album, something like Ulver did some time ago. In Fire Metal’s interview with the band, they have stated that the album’s name translates as “Cold As Death, Pale As Dead”. I can think of no better words to describe this album: This is cold and pale Black Metal (probably about the dead).\n\r\nDon’t get me wrong, I am not claiming that the band is 100% original. Well, their originality arises from mixing the best bands’ stuff out there and combining them with their own style and arrangement approaches. It is like tasting an excellently prepared cocktail made up of the best drinks around. This is maybe second to creating the originals, but still something highly worthy, and with a great potential.\n\r\nThis album has spirit, pace, aggressiveness, some good musicianship and most important of all, it is flowing, thanks to the melodic riffs that project a dark atmosphere on their own. 
They are not melodic through usage of keyboards and faggot riffs, but preferring to play something more worthy to listen to than to raise the urge to destroy just like what Marduk and similar bands do. The drumming carry the music all through the album, never giving a sign of lack of energy, neither showing the urge to mess everything up for displays of technicality and to show off. Bass guitar, as in the previous album, gives the strong structure of the back-end on which the whole music stands strong. The tracks have consistency, the instruments have consistency and the album is eventually a compact shredder.\n\r\nProduction is at a high standard relative to many Black Metal bands; nothing is too much standing out or seemingly getting lost, but everything is at the level where they should be in order to create a strong and pressurizing sound.\n\r\nIf you like those old Norwegian and Swedish Black Metal bands, give it a real, hard listen without underrating the band with prejudices; you will not be disappointed. I’ll give it a beautiful 85, 15 points off to save for the next, for releasing it as mp3 and making such an obsessive man like me go mad because of not being able to know what the fuckin’ lyrics are all about.\n',
1)
dev_texts[true_positives[-1]], dev_labels[true_positives[-1]]
('\nIt is a shame that Zyklon B were not around for longer than they were, and it is a shame that they didn\'t release much more material, because this short but sweet EP is absolutely terrific. If you want vicious riffs, grim vocals, and blisteringly fast drumming, this is the EP for you. There may only be a small handful of tracks, but they really pack a punch. Zyklon-B managed to completely master war black metal in only a couple of tracks.\n\nBlood Must Be Shed has a couple of highlights, one fo them being the track it opens up with. This brilliant track, entitled "Mental Orgasm", which features some excellent riffs. The opening riff is abrasive and memorable, being complimented by ultra fast blast beats which sound excellent. As the track evolves, keyboard comes in in the background to compliment the rest of the song. The keyboard may not be as offensive and hard hitting as the rest of the instruments, but it helps tie them together and highlight the melodic undertones. It must be said that although this EP is very aggressive, some of the progressions are awe-inspiring and, if played on violin, would fit right into any classical piece.\n\nWarfare, the third track on this EP, is a real highlight. The melodic keyboards feel very powerful here, and the guitar playing is top notch. I feel that although the riffs, vocals, and drums are all perfect on this track, it would be lacking without the keyboards to add to the grand feeling of this record. One of the best things about this record is that it manages to surpass simply being "brutal" or "hard core and abrasive" and manages to be very interesting musically and very enjoyable to listen to. A sort of majestic chaos is the feeling you get when listening to this release.\n\nFor me, the best track on here is the closing one, a remix of Warfare, entitled Total Warfare. 
This takes the original, already excellent track, and spices it up a bit by inter-splicing some industrial sections and adding some rising and falling synths in the back, reminiscent of air raid sirens from the second world war. All of what is said about the original track applies here too. The little sections of dialogue are also quite interesting, for example, the whole song stops and an American voice says "For suicide instructions by credit card press 2". I think these are a really good way to highlight the misanthropic message of the band, and also add a little variety.\n\nOverall, I would strongly recommend this to anybody who likes extreme metal. This is rightly regarded as a classic, and should be a must listen for anyone who likes extreme music. The soaring keyboards make it not so difficult to listen to either, as the melodic touch can be somewhat pleasant to hear. One of the greatest releases in the entirety of war black metal.\n',
1)
dev_texts[true_negatives[-1]], dev_labels[true_negatives[-1]]
('\nPerhaps the upwards velocity of the band had been curbed by the year 2006. With two unrelenting hybrids, generally well received slabs of modernist black/death metal attack beneath their belts, what more could the band really offer us? Would they simply turn back towards their mainstays and forget all about this project, or was there something else in the time streams planned for the Zyklon fan...it turns out this was the case, and keeping the same lineup and style intact from the sophomore Aeon, the Norwegian super group would once more emerge from their caverns of creation to try and bludgeon us upside the head once more.\n\r\nDisintegrate is an anomaly to me, because for all purposes, its contents have been very carefully measured and committed to the studio with a lot of superior elements to the past. These are arguably the band\'s busiest compositions, and some will say their best recorded (though a case might be made for World ov Worm\'s less bassy scenario or Aeon\'s turbulent depths). I can\'t say I disagree, as this is the brightest, in your face record of Zyklon\'s career. Yet, for all its strengths, I found Disintegrate to be remarkably.. forgettable in the long run. Almost every song on this album contains 1-2 riffs of value and then a bunch of throwaway matter that feels like a retread of prior songs, which I can definitely live without.\n\r\nFor example, take "Vile Ritual", which has quite a lot of impact and a few excellent rhythms of hyper thrashing/death metal woven throughout, before it lapses into less interesting patterns and ultimately a half-assed thrash breakdown which brings nothing to the table until the solo, which is itself pretty lamentable. "Wrenched" opens with huge, evil old school death rhythm lent atmosphere by Tony\'s vocals, and proceeds through a decent, thick industrial sheen, but then eventually lags off into some dull chug that seems like a slightly less coherent version of "Core Solution" off the second album. 
Those are actually two of the better songs...because then you\'ve got fare like "Vulture" or the Gorguts-like mesh of "A Cold Grave" which I wouldn\'t remember if you shot me with it.\n\r\nThe lyrics here are actually decent, but when you take into consideration all these simplified song titles like "Underdog", "Vulture", and "Skinned and Endangered", this album feels like its often some soulless attempt at creating a more extreme alternative to Fear Factory, with less of the industrial influence. Gone are such inspirational titles as "Hammer Revelation" or "No Name Above the Names", and one soon casts the impression that this album was far harder on the band\'s limbs than their imaginations. At the same time, it\'s not really something I would dub \'poorly written\', just lacking any durable entertainment value. Surely, the performances are intense, especially in the drumming and shred work, but I can\'t recall a single moment where real excitement or surprise at some blazing, excellent riff transpired within me.\n\r\nI think the large share of my disappointment comes with the missed opportunities here. For all intensive purposes, Zyklon could have become this magnificent beast which merged together the black/death metal and industrial-electronic output to something much more important and widely spread throughout the realm, and yet they\'ve gone down the route of a straight death metal act with only a touch of the black remaining, and a few samples to boot. Perhaps this album\'s title was indicative of the band\'s own mental state with the project, because after its release the band would indeed fold, first delegated to a prolonged hiatus and more recently with the official notice via Samoth.\n\r\n-autothrall\r\nhttp://www.fromthedustreturned.com\n',
0)
dev_texts[false_positives[-1]], dev_labels[false_positives[-1]]
('\nWhile it still can\'t match World ov Worms, Disintegrate makes a concerted effort to rectify many of the pitfalls introduced on the lackluster Aeon, and at the very least ends Zyklon\'s career on a relatively high note. These compositions are stuffed to the brim with multiple shifts in both tone and delivery, being lively without necessarily becoming hectic and chaotic. The production values are also top notch this time around, lending a deservedly massive sonic palette to the proceedings that help amplify the snappy nature of Torson\'s kit alongside the windswept tremolo barrage.\n\r\nSechtdamon even impresses this time around, sparingly radiating a number of disparate vocal styles not limited to his sepulchral death roars. He delivers some of the half-shouting melodic passages that hail back to Daemon\'s fiendish inflection from World ov Worms. While I could easily stack Disintegrate up to Zyklon\'s spectacular debut and highlight all of the reasons it still falls short, that would shortchange the fact that the band wisely discarded the memories of the meandering Aeon and surreptitiously threw away the key. While Zyklon clearly still takes heavy influence from mid-era Morbid Angel and the like, the mixing pot of styles has become even more eclectic, drawing from a multitude of more modern sources and taking great advantage of it.\n\r\nWhile the lack of keyboards still leaves an atmospheric void, part of the cybernetic patina that the band began to move away from has returned along with the boomy, sterile nature of the mix. Samoth\'s tremolo barrage billows forth and suffocates with it\'s burning, buzzsaw tone. The atonal ascending lower-register riffs help fill the remaining gaps as the entire performance meshes into a cohesive assault on the senses. In fact, Zyklon very nearly comes off as a less busy Suffocation during some of the heavier passages of "Wrenched". 
The solos are also spectacular odes to excess, with the notably chaotic solo on "Vulture" standing out upon first listen.\n\r\nJust like on Aeon, the final track is a slower, more atmospheric romp that hails back to "An Eclectic Manner". While "Skinned and Endangered" is rightfully more measured in it\'s delivery, the atonal harshness of the axes really help them slide into their comfort zone here. This ratchets up the heaviness of Disintegrate and upstages nearly anything the band has released from a purely hostile viewpoint. In fact, the calamitous inhibition to the guitar work reminds me of fellow Norwegians Sarcoma Inc. and their overwhelming barrage of distorted corpulence.\n\r\nTorson continues to upstage his previous performances, upping the ante regarding speed and vivacity on the kit. The poppy snare and organic timbre to the rest of the drum set help sell the appeal of Disintegrate\'s modern sonic palette. While the bass\' presence has taken a notable step back since Aeon, it matters little when the guitars are this well balanced. In fact, Disintegrate is one of the best produced modern death metal records I have ever heard next to Decapitated\'s Organic Hallucinosis. \n\r\nAs such, it remains a shame that Samoth decided to throw in the towel on the entire project following this album. Zyklon was finally on the upswing again, clearly taking the project seriously enough to include a multitude of stimulating and crushing songwriting attributes that set the band apart from the horrendously overcrowded death metal scene. While it still has the proclivity to sound samey at times, Disintegrate ends up embodying what Aeon truly wanted to be, making it a required acquisition for fans of the band after World ov Worms.\n',
0)
dev_texts[false_negatives[-1]], dev_labels[false_negatives[-1]]
("\nI was pleasantly surprised when I approved this band on the Archives, they're an original act with a fun name, a distinctive approach and a great aptitude at mixing genres. Way better than the usual groove or deathcore crap we often deal with!They were classified by a long blurb of terms that rarely go well together, something like ''blues rock and roll black metal'', I decided to change it to ''black metal'' but that's almost oversimplifying things. In their words, the band play ''sleazy n' cheesy bluesy rockin' black metal'' and that's totally accurate. There's not a lot of bands that could play a better rendition of this style.\n\r\nAfter a pointless intro of 1 minute, the first song starts with a bang. With its twelve minutes, ''Skull Shaped Bell'' is perhaps the best song on the release, encompassing the band's genre very well. The band has this jam vibe that's really really nice obviously influenced by acid and psychedelic rock. We can feel the touch of Roky Erickson, Grateful Dead or even Krautrock in Zud. The four ''real'' songs are all lenghty numbers ranging from eight to twelve minutes and they're all interesting.\n\r\nWhen it's more on the metallic spectrum, it reminds me of a black metal Deceased. Especially the vocals of bandleader Justin Curtsinger, the specter of King Fowley is near. It has the influences of the early European black metal scene with nods to Bathory, Mayhem or even Immortal. The production is nice, not too raw and not overproduced. The vocals are rightfully placed and the riffs are natural and warm. You can hear the experience of these guys, they're definitely veterans of the Maine's metal scene and they really knew how to achieve the sound they wanted for this album. The band intertwines their primitive brand of black metal with some very high class clean leads and rock parts. 
All the solos and leads are quite good, it has this rock sense of melody that will please the fans of a more melodic and less abrasive sort of black metal. It's akin to Darkthrone's punk approach on albums like The Cult is Alive. Even though their transitions from heavy to soft remind me of Opeth, it's very well mixed and enjoyable. Still if you're a black metal cultist, you probably won't like their transgressions into rock territory. \n\r\nIt's not a fast band even when they're at their heaviest, they're pretty laid back and atmospheric. Not in a ''we love nature soooooo much'' way like state brothers Falls of Rauros or the whole Cascadia movement but in a old school charming way. Comparable to the dark but romantic aura of The Chasm. Evolving from within their influences, Zud is an original band who isn't pushing the boundaries because of a so-called will to transcend musicality. They play their music with an honest blend of Americana, blues, rock and roll and all the good things my dad tried to push on me when I was a kid. Thanks dad.\n\r\nGood melodies, tasty tremolo riffs, interesting leads and cool understandable harsh vocals bordering on death metal are good aspects of this album. While the songs could necessitate some slimming down and the tempos could be a bit more varied, the band is a band to discover. They're like a rockier slower version of Midnight, fun stuff, really.\n\nMetantoine's Magickal Realm\n",
1)
html = file_html(p, CDN, "tsne-plot")
with open('data/tsne-plot.html', 'w') as f:
f.write(html)
df10 = df.sample(frac=.1)
df90 = df[~df.index.isin(df10.index)]
data = {}
labels = {}
for data_type in ["test", "train"]:
    data[data_type] = {}
    labels[data_type] = {}
    for sentiment in ['pos', 'neg']:
        data[data_type][sentiment] = []
        labels[data_type][sentiment] = []
for i in df10.index:
    if df10.loc[i, "sentiment"] == 1:
        data["test"]["pos"].append(df10.loc[i, "review"])
    else:
        data["test"]["neg"].append(df10.loc[i, "review"])
labels["test"]["pos"] = [1] * len(data["test"]["pos"])
labels["test"]["neg"] = [0] * len(data["test"]["neg"])
print("Positive test reviews: {0}, Negative test reviews: {1}".format(len(data["test"]["pos"]), len(data["test"]["neg"])))
Positive test reviews: 6106, Negative test reviews: 3957
for i in df90.index:
    if df90.loc[i, "sentiment"] == 1:
        data["train"]["pos"].append(df90.loc[i, "review"])
    else:
        data["train"]["neg"].append(df90.loc[i, "review"])
labels["train"]["pos"] = [1] * len(data["train"]["pos"])
labels["train"]["neg"] = [0] * len(data["train"]["neg"])
print("Positive training reviews: {0}, Negative training reviews: {1}".format(len(data["train"]["pos"]), len(data["train"]["neg"])))
Positive training reviews: 54503, Negative training reviews: 36060
def prepare_ma_data(data):
    """Prepare training and test sets from MA reviews."""
    data_train = data['train']['pos'] + data['train']['neg']
    data_test = data['test']['pos'] + data['test']['neg']
    labels_train = [1]*len(data['train']['pos']) + [0]*len(data['train']['neg'])
    labels_test = [1]*len(data['test']['pos']) + [0]*len(data['test']['neg'])
    data_train, labels_train = shuffle(data_train, labels_train)
    data_test, labels_test = shuffle(data_test, labels_test)
    # Return unified training data, test data, training labels, test labels
    return data_train, data_test, labels_train, labels_test
data_train, data_test, labels_train, labels_test = prepare_ma_data(data)
print("MA reviews (combined): train = {}, test = {}".format(len(data_train), len(data_test)))
MA reviews (combined): train = 90563, test = 10063
words_train = df90[df90.sentiment == 1]['review_lemmas'].values.tolist() + df90[df90.sentiment == 0]['review_lemmas'].values.tolist()
words_test = df10[df10.sentiment == 1]['review_lemmas'].values.tolist() + df10[df10.sentiment == 0]['review_lemmas'].values.tolist()
vectorizer = CountVectorizer(preprocessor=lambda x: x, tokenizer=lambda x: x.split())
features_train = vectorizer.fit_transform(words_train)
features_test = vectorizer.transform(words_test)
features_train = pr.normalize(features_train)
features_test = pr.normalize(features_test)
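Since the reviews were already lemmatized, the identity `preprocessor` and whitespace `tokenizer` make `CountVectorizer` count the pre-computed lemmas verbatim. A minimal pure-Python sketch of that counting plus the per-row L2 normalization applied above (illustrative only, not sklearn's actual implementation):

```python
import math

def count_vectorize(docs):
    """Build a sorted vocabulary and raw term-count rows from whitespace-split docs."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.split():
            row[index[tok]] += 1
        rows.append(row)
    return vocab, rows

def l2_normalize(row):
    """Scale a count row to unit Euclidean length, as the normalize step does per row."""
    norm = math.sqrt(sum(v * v for v in row)) or 1.0
    return [v / norm for v in row]

vocab, rows = count_vectorize(["good riff good solo", "bad mix"])
print(vocab)    # ['bad', 'good', 'mix', 'riff', 'solo']
print(rows[0])  # [0, 2, 0, 1, 1]
```

Normalizing the rows keeps long reviews from dominating the classifiers purely through higher raw counts.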
for clf in [MultinomialNB(), LinearSVC()]:
    clf.fit(features_train, labels_train)
    print("[{}] Accuracy: train = {}, test = {}".format(
        clf.__class__.__name__,
        clf.score(features_train, labels_train),
        clf.score(features_test, labels_test)))
[MultinomialNB] Accuracy: train = 0.6018241445181807, test = 0.6067773029911557
[LinearSVC] Accuracy: train = 0.720526042644347, test = 0.5620590281228262
from __future__ import unicode_literals, print_function
import plac
import random
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding
@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_texts=("Number of texts to train from", "option", "t", int),
    n_iter=("Number of training iterations", "option", "n", int))
def train_model(model=None, output_dir=None, n_iter=20, n_texts=2000):
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    # add the text classifier to the pipeline if it doesn't exist
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'textcat' not in nlp.pipe_names:
        textcat = nlp.create_pipe('textcat')
        nlp.add_pipe(textcat, last=True)
    # otherwise, get it, so we can add labels to it
    else:
        textcat = nlp.get_pipe('textcat')
    # add label to text classifier
    textcat.add_label('POSITIVE')
    print("Loading The Metal-Archives data...")
    (train_texts, train_cats), (dev_texts, dev_cats) = load_data(limit=n_texts)
    print("Using {} examples ({} training, {} evaluation)"
          .format(n_texts, len(train_texts), len(dev_texts)))
    train_data = list(zip(train_texts,
                          [{'cats': cats} for cats in train_cats]))
    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
    with nlp.disable_pipes(*other_pipes):  # only train textcat
        optimizer = nlp.begin_training()
        print("Training the model...")
        print('{:^5}\t{:^5}\t{:^5}\t{:^5}'.format('LOSS', 'P', 'R', 'F'))
        for i in range(n_iter):
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(train_data, size=compounding(4., 32., 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
                           losses=losses)
            with textcat.model.use_params(optimizer.averages):
                # evaluate on the dev data split off in load_data()
                scores = evaluate(nlp.tokenizer, textcat, dev_texts, dev_cats)
            print('{0:.3f}\t{1:.3f}\t{2:.3f}\t{3:.3f}'  # print a simple table
                  .format(losses['textcat'], scores['textcat_p'],
                          scores['textcat_r'], scores['textcat_f']))
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        with nlp.use_params(optimizer.averages):
            nlp.to_disk(output_dir)
        print("Saved model to", output_dir)
    return nlp
def load_data(limit=0, split=0.8):
    """Load data from the Metal-Archives review set."""
    # Partition off part of the train data for evaluation
    global train_data
    random.shuffle(train_data)
    train_data = train_data[-limit:]
    texts, labels = zip(*train_data)
    cats = [{'POSITIVE': bool(y)} for y in labels]
    split = int(len(train_data) * split)
    return (texts[:split], cats[:split]), (texts[split:], cats[split:])
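`load_data` wraps each binary label in the `{'POSITIVE': bool}` dict that spaCy's textcat consumes; the transform on toy labels:

```python
# Toy illustration (not the notebook's data): binary sentiment labels become
# the per-category dicts that nlp.update() trains the text classifier on.
labels = [1, 0, 1]
cats = [{'POSITIVE': bool(y)} for y in labels]
print(cats)  # [{'POSITIVE': True}, {'POSITIVE': False}, {'POSITIVE': True}]
```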
def evaluate(tokenizer, textcat, texts, cats):
    docs = (tokenizer(text) for text in texts)
    tp = 0.0   # True positives
    fp = 1e-8  # False positives (epsilon avoids division by zero)
    fn = 1e-8  # False negatives
    tn = 0.0   # True negatives
    for i, doc in enumerate(textcat.pipe(docs)):
        gold = cats[i]
        for label, score in doc.cats.items():
            if label not in gold:
                continue
            if score >= 0.5 and gold[label] >= 0.5:
                tp += 1.
            elif score >= 0.5 and gold[label] < 0.5:
                fp += 1.
            elif score < 0.5 and gold[label] < 0.5:
                tn += 1.
            elif score < 0.5 and gold[label] >= 0.5:
                fn += 1.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * (precision * recall) / (precision + recall)
    return {'textcat_p': precision, 'textcat_r': recall, 'textcat_f': f_score}
tnlp = train_model(output_dir='data/ma_model1', n_iter=20, n_texts=len(train_data))
Created blank 'en' model
Loading The Metal-Archives data...
Using 90563 examples (72450 training, 18113 evaluation)
Warning: Unnamed vectors -- this won't allow multiple vectors models to be loaded. (Shape: (0, 0))
Training the model...
LOSS 	  P  	  R  	  F
514.431	0.839	0.899	0.868
369.110	0.846	0.889	0.867
311.449	0.847	0.881	0.864
276.345	0.847	0.878	0.862
255.272	0.847	0.875	0.861
242.119	0.848	0.874	0.861
233.409	0.846	0.874	0.860
226.124	0.847	0.872	0.860
221.353	0.849	0.872	0.860
217.129	0.848	0.871	0.859
216.234	0.848	0.870	0.859
210.670	0.847	0.870	0.858
209.512	0.847	0.869	0.858
213.043	0.847	0.868	0.857
207.692	0.848	0.868	0.858
208.932	0.847	0.869	0.858
207.611	0.848	0.869	0.858
208.336	0.848	0.869	0.858
205.264	0.847	0.869	0.858
209.148	0.847	0.868	0.858
Saved model to data/ma_model1
mdir = 'data/ma_model1'
print("Loading model {0}".format(mdir))
tnlp = spacy.load(mdir)
print("Evaluating model...")
i = 0
correct = 0
false_positives = []
false_negatives = []
true_positives = []
true_negatives = []
predictions = []
texts, labels = zip(*train_data)
for doc in tnlp.pipe(texts, batch_size=1000, n_threads=4):
    predict = bool(doc.cats['POSITIVE'] >= 0.5)
    predictions.append(int(predict))
    if predict == bool(labels[i]):
        correct += 1
        if predict:
            true_positives.append(i)
        else:
            true_negatives.append(i)
    else:
        if predict:
            false_positives.append(i)
        else:
            false_negatives.append(i)
    i += 1
print(float(correct)/i)
print("True Positives: {0}".format(len(true_positives)))
print("False Positives: {0}".format(len(false_positives)))
print("True Negatives: {0}".format(len(true_negatives)))
print("False Negatives: {0}".format(len(false_negatives)))
print("Total documents classified: {0}".format(i))
Loading model data/ma_model1
Evaluating model...
0.9160694764970242
True Positives: 51242
False Positives: 4340
True Negatives: 31720
False Negatives: 3261
Total documents classified: 90563
dev_data[0]
("\nSweet jesus there's a lot of brutal death metal bands out there. I know it's always been a very popular subgenre, but damned if there doesn't seem to have been a ridiculous increase in the number of such bands over the past couple years. Hailing from Edinburgh, Scotland, Sons Of Slaughter is another one of these bands, laden with enough blasts and tremolos and grrs to please most brutal death fans. Unfortunately, their debut release 'The Extermination Strain' isn't able to do much more than echo other such releases.\n\r\nAfter a brief ambient intro (apparently a legal necessity for all metal albums these days), the album kicks off with 'Lead Us Not', a derivative, though enjoyable, slab of brutal death. It's tight and percussive, moving quickly from riff to riff and rhythm to rhythm, maintaining a solid balance of old-school Suffocation emulation with the modern sound we all know and (supposedly) love. The song grinds along appropriately, and leaves the listener satisfied, though not blown away. Really, there's nothing to dislike about such tracks.\n\r\nExcept for the fact that the same track repeated over a half hour gets rather tiresome, to be frank. Minus the brief clean guitar interlude that is 'Between To Suns', 'The Extermination Strain' proceeds to simply bash out one track after another with little variation to tell them apart. While none of the songs are bad, such strict adherence to formula makes the album a pretty tedious listen for all those except totally dire brutal death metal fans who genuinely can't get enough of such sounds. However, most of us are going to be restless around 'Embedded', when one realizes that the LP probably isn't going to change any time soon. Or ever, for that matter.\n\r\nThe Sons Of Slaughter are capable musicians and songwriters, but currently lack the unique flair to make them much more than a drop in a very large and perpetually increasing bucket. 
Perhaps the sophomore release will be an improvement, but for now, 'The Extermination Strain' is only recommended to die-hard brutal death metal fanatics. The rest can pass.\n\r\n(Originally written for www.vampire-magazine.com)\n",
0)
mdir = 'data/ma_model1'
tnlp = spacy.load(mdir)
doc = tnlp(df.iloc[0]['review'])
doc.cats['POSITIVE']
0.9868379235267639
mdir = 'data/ma_model1'
print("Loading model {0}".format(mdir))
tnlp = spacy.load(mdir)
print("Evaluating model...")
i = 0
correct = 0
false_positives = []
false_negatives = []
true_positives = []
true_negatives = []
predictions = []
texts, labels = zip(*dev_data)
for doc in tnlp.pipe(texts, batch_size=1000, n_threads=4):
    predict = bool(doc.cats['POSITIVE'] >= 0.5)
    predictions.append(int(predict))
    if predict == bool(labels[i]):
        correct += 1
        if predict:
            true_positives.append(i)
        else:
            true_negatives.append(i)
    else:
        if predict:
            false_positives.append(i)
        else:
            false_negatives.append(i)
    i += 1
print(float(correct)/i)
print("True Positives: {0}".format(len(true_positives)))
print("False Positives: {0}".format(len(false_positives)))
print("True Negatives: {0}".format(len(true_negatives)))
print("False Negatives: {0}".format(len(false_negatives)))
print("Total documents classified: {0}".format(i))
Loading model data/ma_model1
Evaluating model...
0.9150352777501739
True Positives: 5744
False Positives: 493
True Negatives: 3464
False Negatives: 362
Total documents classified: 10063
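As a sanity check, the accuracy, precision, and recall implied by those counts can be recomputed directly (values copied from the printed output):

```python
# Dev-set confusion counts from the evaluation run above
tp, fp, tn, fn = 5744, 493, 3464, 362

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy)  # 0.9150352777501739, matching the score printed above
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Recall being higher than precision matches the raw counts: the model misses fewer positive reviews (362) than it falsely flags (493).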
cm = confusion_matrix(dev_labels, predictions)
class_names = ['neg', 'pos']
df_cm = pd.DataFrame(cm, index=class_names, columns=class_names)
fig = plt.figure(figsize=(15,10))
try:
    heatmap = sns.heatmap(df_cm, annot=True, fmt="d")
except ValueError:
    raise ValueError("Confusion matrix values must be integers.")
heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0, ha='right', fontsize=14)
heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45, ha='right', fontsize=14)
plt.ylabel('Ground Truth label')
plt.xlabel('Predicted label')
plt.show()
heatmap.get_figure().savefig("data/confusion.png")
def create_tuple(review, sentiment, lst):
    lst.append((review, sentiment))
train_data = []
_ = df90.apply(lambda row: create_tuple(row['review'], row['sentiment'], train_data), axis=1)
dev_data = []
_ = df10.apply(lambda row: create_tuple(row['review'], row['sentiment'], dev_data), axis=1)
import plac
import random
import pathlib
import cytoolz
import numpy
from keras.models import Sequential, model_from_json
from keras.layers import LSTM, Dense, Embedding, Bidirectional
from keras.layers import TimeDistributed
from keras.optimizers import Adam
import thinc.extra.datasets
from spacy.compat import pickle
class SentimentAnalyser(object):
    @classmethod
    def load(cls, path, nlp, max_length=100):
        with open(path / 'config.json') as file_:
            model = model_from_json(file_.read())
        with open(path / 'model', 'rb') as file_:
            lstm_weights = pickle.load(file_)
        embeddings = get_embeddings(nlp.vocab)
        model.set_weights([embeddings] + lstm_weights)
        return cls(model, max_length=max_length)

    def __init__(self, model, max_length=100):
        self._model = model
        self.max_length = max_length

    def __call__(self, doc):
        X = get_features([doc], self.max_length)
        y = self._model.predict(X)
        self.set_sentiment(doc, y)

    def pipe(self, docs, batch_size=1000, n_threads=2):
        for minibatch in cytoolz.partition_all(batch_size, docs):
            minibatch = list(minibatch)
            sentences = []
            for doc in minibatch:
                sentences.extend(doc.sents)
            Xs = get_features(sentences, self.max_length)
            ys = self._model.predict(Xs)
            for sent, label in zip(sentences, ys):
                sent.doc.sentiment += label - 0.5
            for doc in minibatch:
                yield doc

    def set_sentiment(self, doc, y):
        doc.sentiment = float(y[0])
        # Sentiment has a native slot for a single float.
        # For arbitrary data storage, there's:
        # doc.user_data['my_data'] = y
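`pipe` scores each sentence separately and adds its centered score (`label - 0.5`) to the parent doc's `sentiment`; `evaluate` later thresholds that accumulated total at 0.5. The voting logic in isolation, as a pure-Python sketch with toy scores (no spaCy involved):

```python
def doc_sentiment(sentence_scores):
    """Accumulate centered per-sentence scores onto a document total."""
    sentiment = 0.0
    for score in sentence_scores:
        sentiment += score - 0.5
    return sentiment

print(doc_sentiment([0.9, 0.9, 0.7]) >= 0.5)  # True: consistently positive sentences
print(doc_sentiment([0.4, 0.3]) >= 0.5)       # False: mildly negative sentences
```

Note the asymmetry this creates: because the decision is a 0.5 threshold on the *sum* rather than on the mean, a document needs sentences whose positivity clearly outweighs 0.5 in aggregate, not just on average.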
def get_labelled_sentences(docs, doc_labels):
    labels = []
    sentences = []
    for doc, y in zip(docs, doc_labels):
        for sent in doc.sents:
            sentences.append(sent)
            labels.append(y)
    return sentences, numpy.asarray(labels, dtype='int32')
def get_features(docs, max_length):
    docs = list(docs)
    Xs = numpy.zeros((len(docs), max_length), dtype='int32')
    for i, doc in enumerate(docs):
        j = 0
        for token in doc:
            vector_id = token.vocab.vectors.find(key=token.orth)
            if vector_id >= 0:
                Xs[i, j] = vector_id
            else:
                Xs[i, j] = 0
            j += 1
            if j >= max_length:
                break
    return Xs
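`get_features` truncates or zero-pads every document to `max_length` token ids, which is what lets the `Embedding` layer below use `mask_zero=True`. The same pad-or-truncate logic on plain lists (toy ids rather than rows of spaCy's vector table):

```python
def pad_or_truncate(ids, max_length):
    """Fix a sequence of token ids to exactly max_length, zero-padding the tail."""
    return (ids + [0] * max_length)[:max_length]

print(pad_or_truncate([5, 7, 9], 5))           # [5, 7, 9, 0, 0]
print(pad_or_truncate([3, 1, 4, 1, 5, 9], 5))  # [3, 1, 4, 1, 5]
```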
def train(train_texts, train_labels, dev_texts, dev_labels,
          lstm_shape, lstm_settings, lstm_optimizer, batch_size=100,
          nb_epoch=5, by_sentence=True):
    print("Loading spaCy")
    nlp = spacy.load('en_vectors_web_lg')
    nlp.add_pipe(nlp.create_pipe('sentencizer'))
    embeddings = get_embeddings(nlp.vocab)
    model = compile_lstm(embeddings, lstm_shape, lstm_settings)
    print("Parsing texts...")
    train_docs = list(nlp.pipe(train_texts))
    dev_docs = list(nlp.pipe(dev_texts))
    if by_sentence:
        train_docs, train_labels = get_labelled_sentences(train_docs, train_labels)
        dev_docs, dev_labels = get_labelled_sentences(dev_docs, dev_labels)
    train_X = get_features(train_docs, lstm_shape['max_length'])
    dev_X = get_features(dev_docs, lstm_shape['max_length'])
    model.fit(train_X, train_labels, validation_data=(dev_X, dev_labels),
              epochs=nb_epoch, batch_size=batch_size)
    return model
def compile_lstm(embeddings, shape, settings):
    model = Sequential()
    model.add(
        Embedding(
            embeddings.shape[0],
            embeddings.shape[1],
            input_length=shape['max_length'],
            trainable=False,
            weights=[embeddings],
            mask_zero=True
        )
    )
    model.add(TimeDistributed(Dense(shape['nr_hidden'], use_bias=False)))
    model.add(Bidirectional(LSTM(shape['nr_hidden'],
                                 recurrent_dropout=settings['dropout'],
                                 dropout=settings['dropout'])))
    model.add(Dense(shape['nr_class'], activation='sigmoid'))
    model.compile(optimizer=Adam(lr=settings['lr']),
                  loss='binary_crossentropy', metrics=['accuracy'])
    return model

def get_embeddings(vocab):
    return vocab.vectors.data
def evaluate(model_dir, texts, labels, max_length=100):
    nlp = spacy.load('en_vectors_web_lg')
    nlp.add_pipe(nlp.create_pipe('sentencizer'))
    print("Loading model...")
    nlp.add_pipe(SentimentAnalyser.load(model_dir, nlp, max_length=max_length))
    correct = 0
    i = 0
    print("Evaluating model...")
    for doc in nlp.pipe(texts, batch_size=1000, n_threads=4):
        correct += bool(doc.sentiment >= 0.5) == bool(labels[i])
        i += 1
    return float(correct) / i
def read_data(data_dir, limit=0):
    examples = []
    for subdir, label in (('pos', 1), ('neg', 0)):
        for filename in (data_dir / subdir).iterdir():
            with filename.open() as file_:
                text = file_.read()
            examples.append((text, label))
    random.shuffle(examples)
    if limit >= 1:
        examples = examples[:limit]
    return zip(*examples)  # Unzips into two lists
@plac.annotations(
    train_dir=("Location of training file or directory",),
    dev_dir=("Location of development file or directory",),
    model_dir=("Location of output model directory",),
    is_runtime=("Demonstrate run-time usage", "flag", "r", bool),
    nr_hidden=("Number of hidden units", "option", "H", int),
    max_length=("Maximum sentence length", "option", "L", int),
    dropout=("Dropout", "option", "d", float),
    learn_rate=("Learn rate", "option", "e", float),
    nb_epoch=("Number of training epochs", "option", "i", int),
    batch_size=("Size of minibatches for training LSTM", "option", "b", int),
    nr_examples=("Limit to N examples", "option", "n", int)
)
def sentiment_analyzer(model_dir=None, train_dir=None, dev_dir=None,
                       is_runtime=False,
                       nr_hidden=64, max_length=100,   # Shape
                       dropout=0.5, learn_rate=0.001,  # General NN config
                       nb_epoch=10, batch_size=256, nr_examples=-1):  # Training params
    if model_dir is not None:
        model_dir = pathlib.Path(model_dir)
    if is_runtime:
        if dev_dir is None:
            dev_texts, dev_labels = zip(*dev_data)
        else:
            dev_texts, dev_labels = read_data(dev_dir)
        acc = evaluate(model_dir, dev_texts, dev_labels, max_length=max_length)
        print(acc)
    else:
        if train_dir is None:
            random.shuffle(train_data)
            train_texts, train_labels = zip(*train_data)
        else:
            print("Read data")
            train_texts, train_labels = read_data(train_dir, limit=nr_examples)
        if dev_dir is None:
            random.shuffle(dev_data)
            dev_texts, dev_labels = zip(*dev_data)
        else:
            dev_texts, dev_labels = read_data(dev_dir, limit=nr_examples)
        train_labels = numpy.asarray(train_labels, dtype='int32')
        dev_labels = numpy.asarray(dev_labels, dtype='int32')
        lstm = train(train_texts, train_labels, dev_texts, dev_labels,
                     {'nr_hidden': nr_hidden, 'max_length': max_length,
                      'nr_class': 1},
                     {'dropout': dropout, 'lr': learn_rate},
                     {},
                     nb_epoch=nb_epoch, batch_size=batch_size)
        weights = lstm.get_weights()
        if model_dir is not None:
            with (model_dir / 'model').open('wb') as file_:
                pickle.dump(weights[1:], file_)
            with (model_dir / 'config.json').open('w') as file_:
                file_.write(lstm.to_json())
sentiment_analyzer(model_dir='data/ma_model_h', is_runtime=True)
Loading model...
Evaluating model...
0.8065189307363609